AITopics | silent speech

silent speech

A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations

Inoue, Masakazu, Sato, Motoshige, Tomeoka, Kenichi, Nah, Nathania, Hatakeyama, Eri, Arulkumaran, Kai, Horiguchi, Ilya, Sasai, Shuntaro

arXiv.org Artificial Intelligence, Jun-18-2025

Data collection is difficult and performed using varying experimental setups, making it nontrivial to collect a large, homogeneous dataset. In this study we introduce neural networks that can handle EEG/EMG with heterogeneous electrode placements and show strong performance in silent speech decoding via multi-task training on large-scale EEG/EMG datasets. We achieve improved word classification accuracy in both healthy participants (95.3%) and a speech-impaired patient (54.5%), substantially outperforming models trained on single-subject data (70.1% and 13.2%). Moreover, our models also show gains in cross-language calibration performance. This increase in accuracy suggests the feasibility of developing practical silent speech decoding systems, particularly for speech-impaired patients.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.13835

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Pretraining Large Brain Language Model for Active BCI: Silent Speech

Zhou, Jinzhao, Cao, Zehong, Duan, Yiqun, Barkley, Connor, Leong, Daniel, Jiang, Xiaowei, Nguyen, Quoc-Toan, Zhao, Ziyi, Do, Thomas, Chang, Yu-Cheng, Liang, Sheng-Fu, Lin, Chin-teng

arXiv.org Artificial Intelligence, May-6-2025

This paper explores silent speech decoding in active brain-computer interface (BCI) systems, which offer more natural and flexible communication than traditional BCI applications. We collected a new silent speech dataset of over 120 hours of electroencephalogram (EEG) recordings from 12 subjects, capturing 24 commonly used English words for language model pretraining and decoding. Following the recent success of pretraining large models with self-supervised paradigms to enhance EEG classification performance, we propose the Large Brain Language Model (LBLM), pretrained to decode silent speech for active BCI. To pretrain LBLM, we propose the Future Spectro-Temporal Prediction (FSTP) pretraining paradigm to learn effective representations from unlabeled EEG data. Unlike existing EEG pretraining methods that mainly follow a masked-reconstruction paradigm, our proposed FSTP method employs autoregressive modeling in temporal and frequency domains to capture both temporal and spectral dependencies from EEG signals. After pretraining, we finetune our LBLM on downstream tasks, including word-level and semantic-level classification. Extensive experiments demonstrate significant performance gains of the LBLM over fully-supervised and pretrained baseline models. For instance, in the difficult cross-session setting, our model achieves 47.0% accuracy on semantic-level classification and 39.6% in word-level classification, outperforming baseline methods by 5.4% and 7.3%, respectively. Our research advances silent speech decoding in active BCI systems, offering an innovative solution for EEG language model pretraining and a new dataset for fundamental research.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.21214

Country: North America > United States (0.30)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Benster, Tyler, Wilson, Guy, Elisha, Reshef, Willett, Francis R, Druckmann, Shaul

arXiv.org Artificial Intelligence, Mar-2-2024

Silent Speech Interfaces (SSIs) offer a noninvasive alternative to brain-computer interfaces for soundless verbal communication. We introduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions--cross-contrast (crossCon) and supervised temporal contrast (supTcon)--to train a multimodal model with a shared latent representation. This architecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recognition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Adjustment (LISA) significantly improves recognition accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA performs best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demonstrating that SSIs can be a viable alternative to automatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possibilities in human-computer interaction, demonstrating the potential of cross-modal approaches in noisy and data-limited regimes.
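
The abstract above reports its results as word error rate (WER). As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition: word-level edit distance divided by the number of reference words. It is not the authors' evaluation code. The function names `word_error_rate` and `relative_reduction` and the example sentences are hypothetical, and real benchmarks usually normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from one error rate to another."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical sentences, chosen only for illustration.
    print(word_error_rate("start navigation please", "start the navigation"))
    # The 28.8% and 12.2% figures are the ones quoted in the abstract.
    print(relative_reduction(28.8, 12.2))
```

Under this definition, the reported drop from 28.8% to 12.2% WER corresponds to a relative reduction of roughly 58%.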

recognition, speech recognition, wer, (15 more...)

arXiv.org Artificial Intelligence

2403.05583

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

Kashiwagi, Sara, Tanaka, Keitaro, Feng, Qi, Morishima, Shigeo

arXiv.org Artificial Intelligence, Oct-16-2023

This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to leverage the shared literal content between normal and silent speech and present a metric learning approach based on visemes. Specifically, we aim to map the input of two speech types close to each other in a latent space if they have similar viseme representations. By minimizing the Kullback-Leibler divergence of the predicted viseme probability distributions between and within the two speech types, our model effectively learns and predicts viseme identities. Our evaluation demonstrates that our method improves the accuracy of silent VSR, even when limited training data is available.
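
The abstract above describes minimizing the Kullback-Leibler divergence between predicted viseme probability distributions. The quantity itself can be sketched in a few lines of Python. This illustrates only the divergence, not the paper's loss or training procedure. The function name `kl_divergence`, the `eps` smoothing term, and the three-class example distributions are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    `eps` is an assumed smoothing constant that avoids log(0) when q has
    zero-probability entries; terms with p_i == 0 contribute nothing.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


if __name__ == "__main__":
    # Hypothetical viseme posteriors for the same utterance, spoken
    # normally and mouthed silently (three viseme classes).
    normal_speech = [0.7, 0.2, 0.1]
    silent_speech = [0.5, 0.3, 0.2]
    print(kl_divergence(normal_speech, silent_speech))
    print(kl_divergence(normal_speech, normal_speech))  # identical inputs give 0
```

The divergence is zero only when the two distributions match, so driving it down pulls the predicted viseme distributions of the two speech types toward each other. Note that KL divergence is asymmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.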

silent speech, speech, speech data, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2023-370

2305.14203

Country:

Asia > Japan (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)

Keep those lips sealed! Tech companies can now read them

#artificialintelligence, Aug-1-2021, 21:21:55 GMT

"Start navigation, please," said the driver in the car with noisy passengers. Within seconds, a speech recognition system identified the command and activated the navigation system simply by reading the driver's lips. In another instance, a patient in a hospital with breathing tubes placed below their vocal cords finds it difficult to speak. The helper uses SRAVI – a mobile app that uses Liopa's lip-reading technology – to scan the patient's face while they silently mouth a sentence. The Artificial Intelligence (AI)-assisted system then displays three probable statements of what the patient may be trying to say.

lip movement, prof shah, tech company, (13 more...)

#artificialintelligence

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.06)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.73)
Information Technology > Artificial Intelligence > Vision (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
(2 more...)

UC Berkeley Researchers Use AI For Digital Voicing Of Silent Speech

#artificialintelligence, Nov-27-2020, 04:45:04 GMT

Researchers at UC Berkeley have developed an AI model that detects 'silent speech.' The model is based on digital voicing to predict words and generate synthetic speech. Electromyography (EMG), with electrodes located at the face and throat, is used to detect the silent speech. Researchers assert that the model can enable many applications for people who cannot produce audible speech and assist speech detection for AI tools and additional devices that respond to voice commands. The team states that digitally voicing silent speech has broad applications.

digital voicing, silent speech, uc berkeley researcher use ai, (4 more...)

#artificialintelligence

Country: Asia > China (0.07)

Industry: Education > Educational Setting > Higher Education (0.63)

Technology: Information Technology > Artificial Intelligence (1.00)
